Fastload in Teradata

Posted on May 22, 2020 in Computer Science

Things under legendu.net/outdated are outdated technologies that the author does not plan to update any more. Please look for better alternatives.

** Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement! **

Insert Multiple Rows

Teradata SQL does not support VALUES with multiple rows in an INSERT statement. Two workarounds are described below.

  1. Multi-Statement Request (MSR) + Teradata BTEQ. You can run an MSR (containing multiple INSERT statements) in BTEQ, which submits all of the INSERT statements as one block after the final semicolon. Note: a statement that begins on the same line as the preceding semicolon is part of the same MSR.

    INSERT INTO Example Values('Steve',4)
    ;INSERT INTO Example Values('James',8)
    ;
    

    In Teradata SQL Assistant or Teradata Studio, the shortcut to submit all statements as a single request is F9 (instead of F5).
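    In BTEQ itself, the MSR above can be wrapped in a small script. Below is a sketch; the tdpid/username,password logon string is a placeholder for your own system and credentials.

    .LOGON tdpid/username,password
    INSERT INTO Example Values('Steve',4)
    ;INSERT INTO Example Values('James',8)
    ;
    .LOGOFF
    .QUIT

    Such a script can then be fed to BTEQ on the command line, e.g., bteq < insert_example.bteq.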

Performance

@TODO: convert the Java examples to Kotlin code in a JupyterLab notebook.

  1. Prepared/parameterized statements are recommended for improving the performance of queries, updates, and inserts. Below is an example using JDBC in Java.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    // These are done once.
    Connection conn = DriverManager.getConnection(url, username, password);
    String sql = "INSERT INTO transactions (cust_id, transaction_date, amount, desc) values (?, ?, ?, ?)";
    PreparedStatement ps = conn.prepareStatement(sql);

    // The statements below can be repeated many times with different values.
    ps.setInt(1, custID);
    ps.setDate(2, tran_date);
    ps.setBigDecimal(3, amount);
    ps.setString(4, desc);
    ps.executeUpdate();
    

    In a DB-API compatible Python module, you can use the method execute on a parameterized query with a list/tuple of parameters.

    sql = "INSERT INTO transactions (cust_id, transaction_date, amount, desc) values (?, ?, ?, ?)"
    cursor.execute(sql, [cust_id, tran_dt, amount, desc])
    

    Prepared Statement Batches take your performance to the next level. In addition to the benefits of reusing the Prepared Statement, batching your input values also reduces the number of round trips to the database. A batch size of roughly 5,000 to 10,000 works well for most applications. Using batches can be 10 to 40 times faster than the previous approach. Below is an example of using JDBC in Java.

    Connection conn = DriverManager.getConnection(url, username, password);
    String sql = "INSERT INTO transactions (cust_id, transaction_date, amount, desc) values (?, ?, ?, ?)";
    PreparedStatement ps = conn.prepareStatement(sql);
    for ( /* Loop through input values */ ) {
        for ( /* Loop through a subset of the input values - the desired batch size */ ) {
            ps.setInt(1, custID);
            ps.setDate(2, tran_date);
            ps.setBigDecimal(3, amount);
            ps.setString(4, desc);
            ps.addBatch(); // adds the row of input values to the batch
        }
        // This is done once per the desired batch size.
        ps.executeBatch(); // sends all the batched rows to the database
    }
    

    In a DB-API compatible Python module, you can use the method executemany on a parameterized query with a list/tuple of lists/tuples of parameters.

    sql = "INSERT INTO transactions (cust_id, transaction_date, amount, desc) values (?, ?, ?, ?)"
    cursor.executemany(sql, [
        [cust_id_1, tran_dt_1, amount_1, desc_1],
        [cust_id_2, tran_dt_2, amount_2, desc_2],
        [cust_id_3, tran_dt_3, amount_3, desc_3],
    ])
    

    For loading truly huge amounts of data, JDBC FastLoad can provide even better performance. There are a couple of caveats, however: JDBC FastLoad can only insert data into an empty table, and it is only recommended for loading large amounts of data -- at least 100,000 rows in total. The nice thing is that your Java code doesn't need to change in order to use JDBC FastLoad: your application uses the exact same Prepared Statement batches as in the previous example. Just add TYPE=FASTLOAD to your connection parameters, and the Teradata JDBC Driver will use JDBC FastLoad for particular SQL requests if it can. Note that the recommended batch size for JDBC FastLoad is much higher than for a regular SQL Prepared Statement batch, which means you may need to increase your JVM heap size. To get top-notch performance, use a batch size of roughly 50,000 to 100,000. Using JDBC FastLoad can be 3 to 10 times faster than the previous approach; a sketch of the change follows. Please refer to JDBC FastLoad for more discussion.
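    Below is a minimal, self-contained sketch, assuming the Teradata JDBC Driver is on the classpath. The host dbhost, the database mydb, the FastLoadSketch class, and the Object[] row layout are all made up for illustration; turning off auto-commit and committing once at the end follows the usual JDBC FastLoad advice.

    import java.math.BigDecimal;
    import java.sql.Connection;
    import java.sql.Date;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.util.List;

    public class FastLoadSketch {
        // Each Object[] holds {Integer custId, Date tranDate, BigDecimal amount, String desc}.
        public static void load(String username, String password, List<Object[]> rows) throws Exception {
            // TYPE=FASTLOAD asks the Teradata JDBC Driver to use JDBC FastLoad when it can;
            // "dbhost" and "mydb" are placeholders, and the target table must be empty.
            String url = "jdbc:teradata://dbhost/DATABASE=mydb,TYPE=FASTLOAD";
            try (Connection conn = DriverManager.getConnection(url, username, password)) {
                conn.setAutoCommit(false); // commit explicitly once at the end of the load
                String sql = "INSERT INTO transactions (cust_id, transaction_date, amount, desc) values (?, ?, ?, ?)";
                try (PreparedStatement ps = conn.prepareStatement(sql)) {
                    final int batchSize = 50_000; // 50,000-100,000 is recommended for JDBC FastLoad
                    int count = 0;
                    for (Object[] r : rows) {
                        ps.setInt(1, (Integer) r[0]);
                        ps.setDate(2, (Date) r[1]);
                        ps.setBigDecimal(3, (BigDecimal) r[2]);
                        ps.setString(4, (String) r[3]);
                        ps.addBatch();
                        if (++count % batchSize == 0) {
                            ps.executeBatch(); // send the batched rows to the database
                        }
                    }
                    ps.executeBatch(); // flush the final, partial batch
                }
                conn.commit(); // ends the FastLoad and makes the rows visible
            }
        }
    }

    The only Teradata-specific piece is the TYPE=FASTLOAD connection parameter; everything else is the same Prepared Statement batch pattern as above.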

FastLoad

  1. If you have the Teradata ODBC driver installed, you can access the fastload command by installing Teradata Tools and Utilities. If you use JDBC, it seems that you must write code yourself.

  2. Use temporary (staging) tables when you use fastload to load data, as the target table can be left locked if the job fails partway through.

  3. Make sure to set a large enough CHECKPOINT value for good performance. Frequent checkpointing reduces the speed of your FastLoad job, but it substantially improves FastLoad restart operations. A sample script showing the CHECKPOINT clause is sketched after the list below.

    • Each checkpoint temporarily halts the multiple-session data transfer feature of FastLoad, thereby decreasing the speed of the FastLoad job.
    • The record size and the size of the Teradata Database influence how often you should specify checkpoints. On a smaller Teradata Database, specify checkpoints:
      • every 50,000 records if each record is more than 4 KB;
      • every 100,000 records if each record is less than 4 KB.
    • On a larger Teradata Database, specify higher (less frequent) checkpoint values.
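    As a concrete illustration of where CHECKPOINT goes, below is a sketch of a FastLoad script modeled on common sample scripts (see the references). The database mydb, the staging table, the error tables, the input file, and the column definitions are all made-up placeholders.

    LOGON tdpid/username,password;
    DATABASE mydb;

    /* Records are assumed to be under 4 KB, so checkpoint every 100,000 records. */
    BEGIN LOADING mydb.transactions_stg
        ERRORFILES mydb.transactions_err1, mydb.transactions_err2
        CHECKPOINT 100000;

    /* The input file is comma-separated text. */
    SET RECORD VARTEXT ",";

    DEFINE cust_id (VARCHAR(10)),
        transaction_date (VARCHAR(10)),
        amount (VARCHAR(20)),
        txn_desc (VARCHAR(100))
        FILE = transactions.csv;

    INSERT INTO mydb.transactions_stg VALUES (
        :cust_id, :transaction_date, :amount, :txn_desc);

    END LOADING;
    LOGOFF;

    With Teradata Tools and Utilities installed, such a script is typically run as fastload < load_transactions.fl.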

References

http://teradatatrainingbysatish.blogspot.com/2016/01/teradata-sample-scripts-bteq-fastload.html

https://www.tutorialspoint.com/teradata/teradata_fastload.htm

https://docs.teradata.com/reader/r_6Z4JwVMhANtZFCIKEx7Q/NygMHUT8ewJO_7cXjYx8Ug